117 research outputs found

    Considering Transposable Element Diversification in De Novo Annotation Approaches

    Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species

    High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development

    Using the latest sequencing and optical mapping technologies, we have produced a high-quality de novo assembly of the apple (Malus domestica Borkh.) genome. Repeat sequences, which represented over half of the assembly, provided an unprecedented opportunity to investigate the uncharacterized regions of a tree genome; we identified a new hyper-repetitive retrotransposon sequence that was over-represented in heterochromatic regions and estimated that a major burst of different transposable elements (TEs) occurred 21 million years ago. Notably, the timing of this TE burst coincided with the uplift of the Tian Shan mountains, which is thought to be the center of the location where the apple originated, suggesting that TEs and associated processes may have contributed to the diversification of the apple ancestor and possibly to its divergence from pear. Finally, genome-wide DNA methylation data suggest that epigenetic marks may contribute to agronomically relevant aspects, such as apple fruit development

    Linkage disequilibrium in young genetically isolated Dutch population

    The design and feasibility of genetic studies of complex diseases are critically dependent on the extent and distribution of linkage disequilibrium (LD) across the genome and between different populations. We have examined genomewide and region-specific LD in a young genetically isolated population identified in the Netherlands by genotyping approximately 800 Short Tandem Repeat markers distributed genomewide across 58 individuals. Several regions were an

    Correlation of LNCR rasiRNAs Expression with Heterochromatin Formation during Development of the Holocentric Insect Spodoptera frugiperda

    Repeat-associated small interfering RNAs (rasiRNAs) are derived from various genomic repetitive elements and ensure genomic stability by silencing endogenous transposable elements. Here we describe a novel subset of 46 rasiRNAs named LNCR rasiRNAs due to their homology with one long non-coding RNA (LNCR) of Spodoptera frugiperda. LNCR operates as the intermediate of an unclassified transposable element (TE-LNCR). TE-LNCR is a very invasive transposable element, present in high copy numbers in the S. frugiperda genome. LNCR rasiRNAs are single-stranded RNAs without a prominent nucleotide motif, which are organized in two distinct, strand-specific clusters. The expression of LNCR and LNCR rasiRNAs is developmentally regulated. Formation of heterochromatin in the genomic region where three copies of the TE-LNCR are embedded was followed by chromatin immunoprecipitation (ChIP) and we observed this chromatin undergo dynamic changes during development. In summary, increased LNCR expression in certain developmental stages is followed by the appearance of a variety of LNCR rasiRNAs which appears to correlate with subsequent accumulation of a heterochromatic histone mark and silencing of the genomic region with TE-LNCR. These results support the notion that a repeat-associated small interfering RNA pathway is linked to heterochromatin formation and/or maintenance during development to establish repression of the TE-LNCR transposable element. This study provides insights into the rasiRNA silencing pathway and its role in the formation of fluctuating heterochromatin during the development of one holocentric organism

    A high-quality sequence of Rosa chinensis to elucidate genome structure and ornamental traits

    Rose is the world's most important ornamental plant, with economic, cultural and symbolic value. Roses are cultivated worldwide and sold as garden roses, cut flowers and potted plants. Rose has a complex genome with high heterozygosity and various ploidy levels. Our objectives were (i) to develop the first high-quality reference genome sequence for the genus Rosa by sequencing a doubled haploid, combining long and short read sequencing, and anchoring to a high-density genetic map and (ii) to study the genome structure and the genetic basis of major ornamental traits. We produced a haploid rose line from R. chinensis "Old Blush" and generated the first rose genome sequence at the pseudo-molecule scale (512 Mbp with N50 of 3.4 Mb and L75 of 97). The sequence was validated using high-density diploid and tetraploid genetic maps. We delineated hallmark chromosomal features including the pericentromeric regions through annotation of TE families and positioned centromeric repeats using FISH. Genetic diversity was analysed by resequencing eight Rosa species. Combining genetic and genomic approaches, we identified potential genetic regulators of key ornamental traits, including prickle density and number of flower petals. A rose APETALA2 homologue is proposed to be the major regulator of petal number in rose. This reference sequence is an important resource for studying polyploidisation, meiosis and developmental processes, as we demonstrated for flower and prickle development. This reference sequence will also accelerate breeding through the development of molecular markers linked to traits, the identification of the genes underlying them and the exploitation of synteny across Rosaceae.
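
The abstract above reports assembly contiguity as N50 and L75. As a reading aid, here is a minimal sketch of how such statistics are commonly computed from a list of sequence lengths. The function name `nx_lx` and the toy lengths are assumptions made for illustration; they do not come from the paper, and the authors' exact procedure is not described in the abstract.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover `fraction` of the total
    assembly length; Lx is the number of sequences in that set.
    For example, fraction=0.5 gives (N50, L50).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ordered = sorted(lengths, reverse=True)   # longest sequences first
    threshold = fraction * sum(ordered)       # cumulative length to reach

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Unreachable for valid input: the full list always reaches the threshold.
    raise RuntimeError("threshold not reached")


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy_lengths, 0.5)
n75, l75 = nx_lx(toy_lengths, 0.75)
```

Under this common definition, an "L75 of 97" would mean that the 97 longest sequences together cover at least 75% of the assembly.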

    Novel transposable elements from Anopheles gambiae

    Background: Transposable elements (TEs) are DNA sequences, present in the genomes of most eukaryotic organisms, whose key characteristic is the ability to mobilize and increase their copy number within chromosomes. These elements are important for eukaryotic genome structure and evolution and have lately been considered as potential drivers for introducing transgenes into pathogen-transmitting insects as a means to control vector-borne diseases. The aim of this work was to catalog the diversity and abundance of TEs within the Anopheles gambiae genome using the PILER tool and to consolidate a database in the form of a hyperlinked spreadsheet containing detailed and readily available information about the TEs present in the genome of An. gambiae. Results: Here we present the spreadsheet named AnoTExcel, which constitutes a database with detailed information on most of the repetitive elements present in the genome of the mosquito. Despite previous work on this topic, our approach permitted the identification and characterization of both previously described and novel TEs, which are further described in detail. Conclusions: Identification and characterization of TEs in a given genome is important as a way to understand the diversity and evolution of the whole set of TEs present in a given species. This work contributes to a better understanding of the landscape of TEs present in the mosquito genome. It also presents a novel platform for the identification, analysis, and characterization of TEs in sequenced genomes.

    Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

    Background: Gene discovery algorithms typically examine sequence data for low level patterns. A novel method to computationally discover higher order DNA structures is presented, using a context sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies. Results: We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis demonstrated that for 72,338 missed deletions, two adjacent deleted cassettes were labeled as a single cassette, increasing performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Conclusion: Using grammars we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems which operate at the sequence level.
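
The abstract above reports annotation agreement with an expert gold standard as a kappa value of 0.972. The abstract does not say which kappa variant was used; the sketch below assumes Cohen's kappa for two annotators, which is a common choice, and uses hypothetical labels. The function name `cohen_kappa` and the toy data are illustrative only and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: sum over labels of the product of marginal frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations: "c" = cassette, "n" = not a cassette.
system = ["c", "c", "n", "n"]
expert = ["c", "c", "n", "c"]
kappa = cohen_kappa(system, expert)
```

A kappa close to 1, such as the reported 0.972, indicates agreement far above what matching label frequencies alone would produce.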

    Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

    Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using it we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.

    Meeting the Challenges Facing Wheat Production: The Strategic Research Agenda of the Global Wheat Initiative

    Wheat occupies a special role in global food security since, in addition to providing 20% of our carbohydrates and protein, almost 25% of the global production is traded internationally. The importance of wheat for food security was recognised by the Chief Agricultural Scientists of the G20 group of countries when they endorsed the establishment of the Wheat Initiative in 2011. The Wheat Initiative was tasked with supporting the wheat research community by facilitating collaboration, information and resource sharing and helping to build the capacity to address challenges facing production in an increasingly variable environment. Many countries invest in wheat research. Innovations in wheat breeding and agronomy have delivered enormous gains over the past few decades, with the average global yield increasing from just over 1 tonne per hectare in the early 1960s to around 3.5 tonnes in the past decade. These gains are threatened by climate change and by the rapidly rising financial and environmental costs of fertilizer and pesticides, combined with declines in water availability for irrigation in many regions. The international wheat research community has worked to identify major opportunities to help ensure that global wheat production can meet demand. The outcomes of these discussions are presented in this paper.

    Sequencing of Pooled DNA Samples (Pool-Seq) Uncovers Complex Dynamics of Transposable Element Insertions in Drosophila melanogaster

    Transposable elements (TEs) are mobile genetic elements that parasitize genomes by semi-autonomously increasing their own copy number within the host genome. While TEs are important for genome evolution, appropriate methods for performing unbiased genome-wide surveys of TE variation in natural populations have been lacking. Here, we describe a novel and cost-effective approach for estimating population frequencies of TE insertions using paired-end Illumina reads from a pooled population sample. Importantly, the method treats insertions present in and absent from the reference genome identically, allowing unbiased TE population frequency estimates. We apply this method to data from a natural Drosophila melanogaster population from Portugal. Consistent with previous reports, we show that low recombining genomic regions harbor more TE insertions and maintain insertions at higher frequencies than do high recombining regions. We conservatively estimate that there are almost twice as many “novel” TE insertion sites as sites known from the reference sequence in our population sample (6,824 novel versus 3,639 reference sites, with on average a 31-fold coverage per insertion site). Different families of transposable elements show large differences in their insertion densities and population frequencies. Our analyses suggest that the history of TE activity significantly contributes to this pattern, with recently active families segregating at lower frequencies than those active in the more distant past. Finally, using our high-resolution TE abundance measurements, we identified 13 candidate positively selected TE insertions based on their high population frequencies and on low Tajima's D values in their neighborhoods.